# Readme

* This dataset contains one database for each studied project.
* The structure (schema) of the databases is identical.
* **Note:** the databases are compressed with `bzip2`.
  Decompress them with the following command:

  ```bash
  $ bzip2 -d *.sqlite3.bz2
  ```
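
If the `bzip2` command-line tool is not available, the same step can be sketched with Python's standard library. The function name `decompress_databases` and its `directory` argument are illustrative and not part of the dataset. Unlike `bzip2 -d`, which removes the `.bz2` file after decompressing, this sketch keeps the compressed archives.

```python
import bz2
import shutil
from pathlib import Path


def decompress_databases(directory="."):
    """Decompress every *.sqlite3.bz2 file in `directory`, keeping the archives.

    Returns the list of paths of the decompressed databases.
    """
    written = []
    for archive in sorted(Path(directory).glob("*.sqlite3.bz2")):
        # "project.sqlite3.bz2" -> "project.sqlite3"
        target = archive.with_suffix("")
        # Stream the data so large databases are not loaded into memory at once.
        with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Calling `decompress_databases(".")` from the dataset directory writes one `.sqlite3` file next to each archive.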

## Database Schema

### Table `issue`

* Represents an issue from the issue tracker
* Primary key: `issue_id`
* Remaining columns are attributes retrieved from the issue tracker, such as
  * Issue Type,
  * Assignee, Reporter
  * etc.
* The special column `pp_description` holds the issue description after pre-processing (see Section 4 of the paper)

### Table `issue_comment`

* Represents a comment from the issue tracker
* Primary key: `comment_id`
* Foreign key: `issue_id`
* Remaining columns are attributes retrieved from the issue tracker:
  * `username`: user login name
  * `display_name`: user display name
  * `created_date`: comment creation timestamp (UTC)
  * `message`: comment content
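
As a minimal sketch of how the two tables relate, the following Python snippet counts the comments per issue by joining `issue_comment` to `issue` on `issue_id`. It relies only on the table and key names listed above. The function name `comments_per_issue` and the `limit` parameter are illustrative assumptions, not part of the dataset.

```python
import sqlite3


def comments_per_issue(db_path, limit=10):
    """Return (issue_id, comment_count) rows, most-commented issues first."""
    query = """
        SELECT i.issue_id, COUNT(c.comment_id) AS n_comments
        FROM issue AS i
        LEFT JOIN issue_comment AS c ON c.issue_id = i.issue_id
        GROUP BY i.issue_id
        ORDER BY n_comments DESC, i.issue_id
        LIMIT ?
    """
    conn = sqlite3.connect(db_path)
    try:
        # LEFT JOIN keeps issues without comments; they get a count of 0.
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()
```

For a decompressed database, `comments_per_issue("project.sqlite3")` would return up to ten rows, where `project.sqlite3` is a placeholder for one of the project files.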

## Additional Material

* The table `bots.xslx` lists the bots found in each project and the number of comments each bot has generated.
* The table `collaboration_per_issue_type.xslx` shows the conversation types across all projects, broken down by issue type.
